DETAILED ACTION
Response to Arguments 

1. Applicant's arguments filed 12/2/05 have been fully read and considered but they
are not persuasive. 

Regarding lines 10-11, 15-16 and 18-20 on page 13 of applicant's remarks about
claim 23, applicant asserts that Lee does not teach the limitation of "adding" based on "if
a threshold number of feature points are identified in the next frame, adding the next
frame to the first segment". The examiner respectfully disagrees.

Jain teaches manually adjusting the number of key frames, where the number is 
one key frame for every thirty frames, ie. a segment (col.23, ln.64 to col.24, ln.3).
Therefore, since Jain teaches manually adjusting one key frame or representative frame 
for every thirty frames, it would have been obvious to one of ordinary skill in the art to 
manually change the number of key (representative) frames per segment from 
anywhere between two to five key or representative frames per segment if necessary 
for accurately enhancing the three-dimensional representation of the targeted scene. 
Also, in column 2, line 65 to column 3, line 31, Lee teaches the determining whether a 
threshold number of feature points from base frame are identified in the second frame 
by using threshold values TH and comparison of threshold values of feature points 
between the current frame and the reference frame to check if the threshold is 
exceeded, and that Lee also teaches that if a threshold number of feature points are 
identified in the second frame, adding the second frame to the segment. In figure 3, 
Lee suggests the cyclical process of determination of the threshold number values.
Thus, Lee teaches the repeating the analyzing step, determining step and adding step
for subsequent frames until the number of feature points in a frame falls below the 
threshold number. 
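
For illustration only, the repeated analyzing, determining and adding steps discussed above can be sketched as follows. This is a minimal sketch and is not taken from Lee, Jain or the application: the function name `build_segment`, the representation of each frame as a set of feature-point IDs, and the threshold value used in the example are all assumptions.

```python
def build_segment(frames, base_index, threshold):
    """Grow a segment from a base frame (hypothetical helper).

    frames: list of sets; each set holds the IDs of the feature points
            found in that frame (an assumed representation).
    A subsequent frame is added to the segment while at least `threshold`
    feature points of the base frame are still identified in it.
    """
    base_points = frames[base_index]
    segment = [base_index]
    for i in range(base_index + 1, len(frames)):
        # Analyzing step: which base-frame feature points appear in frame i?
        tracked = base_points & frames[i]
        # Determining step: stop once the count falls below the threshold.
        if len(tracked) < threshold:
            break
        # Adding step: frame i joins the segment.
        segment.append(i)
    return segment


# Made-up feature-point IDs for five frames.
frames = [{1, 2, 3, 4, 5}, {1, 2, 3, 4}, {1, 2, 3}, {1, 2}, {1, 2, 3, 4, 5}]
print(build_segment(frames, 0, 3))  # [0, 1, 2]
```

In this sketch the segment ends at the first frame in which fewer than three of the base frame's feature points are found, so the later frame that again contains all five points is not added.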

Therefore, it would have been obvious to one of ordinary skill in the art to 
combine the teachings of Jain and Lee, as a whole, for improving the encoding of video 
image data so as to accurately encode images via the selection of feature points 
according to the motion of objects in a financially robust manner, as disclosed in Lee's 
column 2, lines 60-64. 

Dependent claims 24-30 are rejected for at least similar reasons as stated for 
claim 23. 

Dependent claims 17 and 20 are rejected for at least similar rationale as stated 
for claim 23. 

Independent claim 1 is now rejected under 35 U.S.C. 103 in view of Jain and Lee
since claim 1 now comprises some similar limitations as mentioned in claim 23. See 
above paragraphs and the rejection below. 

Dependent claims 2 and 4-7 are rejected for at least similar reasons as claim 1.
For instance, in column 23, line 58 to column 24, line 3, Jain discloses the use of 
extracting of key frames by selecting one key frame from every 30 frames. Also peruse 
the rejection below. 

Regarding lines 11-17 on page 19 of applicant's remarks about claim 9, applicant 
argues that Jain does not disclose "... for each segment, encoding the frames in the
segment into at least two virtual key frames that include a three-dimensional structure
for the segment and an uncertainty associated with the segment". The examiner
respectfully disagrees. In column 23, line 58 to col.24, line 3, Jain discloses extracting 
virtual key frames by choosing one key frame from every 30 frames in that every 30 
frames can be considered a segment of a sequence of frames. Also, in column 24, 
lines 38-67, Jain discloses using virtual key frames for obtaining the best possible three-
dimensional reconstruction of the two-dimensional frame data in that if there is not
enough known points, ie. uncertainty, from virtual key frames. Thus, the segmented 
frames are encoded into at least two virtual key frames to ascertain the best, possible 
three-dimensional reconstruction of the two-dimensional frame data to yield the 3D 
visualization. In fig.8, camera 1 obtains a sequence of 412 frames for approximately 13 
seconds, and that every 30 frames obtained for each second at the standard NTSC 
frame rate (30 frames/sec), where the virtual key frames are extracted in preparation for 
image recording and encoding sent for processing to be viewed at the display terminal. 
Thus, Jain meets the broad limitations of claim 9. 

Dependent claims 10-22 are rejected for at least similar reasons as claim 9. 

Regarding lines 2-4 on page 21 of applicant's remarks about claim 36, applicant 
contends that Jain does not specifically disclose "calculating a partial model for each 
segment that includes three-dimensional coordinates and camera pose for features 
within the frames" and "extracting virtual key frames from each partial model". The 
examiner respectfully disagrees. In figure 12, Jain discloses that there are multiple 
"image to ground projection" sections that are used for calculating and projecting an 
image or a partial model for each segment of that includes three-dimensional 



Application/Control Number: 09/338,176 Page 5 

Art Unit: 2613 

occupancy estimation for which a 3D map of is generated in an attempt to form a
dynamic model. In column 21, line 63 to column 22, line 7, Jain discloses the obtaining
of the feature points within the frames. And in column 22, line 62 to column 23, line 56, 
Jain discloses the equations including three dimensional coordinates (x, y, z) along with 
camera position or pose, camera angle and camera parameter to obtain a partial model 
or a "image to ground projection". Thus, Jain discloses "calculating a partial model for 
each segment that includes three-dimensional coordinates and camera pose for 
features within the frames". 

In column 23, line 58 to column 24, line 3, Jain teaches extracting key frames by 
selecting one key frame from every 30 frames in that every 30 frames can be 
considered a segment of a sequence of frames. And also, in column 24, line 38-67, 
Jain discloses the key frames are used to obtain the best possible three-dimensional 
reconstruction of the two-dimensional frame data in that if there is not enough known 
points, ie. uncertainty, from key frames, estimates or bundle adjustments were made to 
ascertain the best, possible three-dimensional reconstruction of the two-dimensional 
frame data to yield the 3D visualization. Thus, Jain discloses "extracting virtual key 
frames from each partial model". 

Regarding lines 1-5 on page 23 of applicant's remarks about claim 37, applicant 
argues that Jain does not disclose "calculating a partial model for each segment..." and 
"extracting virtual key frames from each partial model". The examiner respectfully 
disagrees. Claim 37 is rejected for at least similar reasons as claim 36.

Regarding lines 2-4 and 10-14 on page 24 of applicant's remarks about claim 31,
applicant mentions that Jain does not explicitly suggest "dividing a long sequence of 
frames into segments and reducing the number of frames in each segment by 
representing the segments using between two and five representative frames per 
segment", and that "manual selection" is not the same as "manual adjustment". The 
examiner respectfully disagrees. In figure 8, Jain discloses camera 1 captures a 
sequence of 412 frames for approximately 13 seconds, and that every 30 frames 
obtained for each second, ie. the standard NTSC frame rate (30 frames/sec), can be 
considered a segment, so in this case, camera 1 has approximately 14 segments, thus, 
Jain discloses the division of the sequence of images into segments. Also, Jain states 
that the extraction of key frames by selecting one key frame from every 30 frames (ie. a 
segment of a sequence of frames), as disclosed in column 23, line 58 to column 24, line 
3. Clearly, Jain discloses there are segments within a sequence of frames, otherwise,
the ascertainment of key frames would not be possible without these segments, where 
each segment is formed from a sequence of 30 frames. 
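
The segment count stated above can be checked with simple arithmetic. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and taking the first frame of each 30-frame segment as its key frame is an assumption made only for illustration, not something stated in Jain.

```python
import math


def count_segments(total_frames, frames_per_segment=30):
    # A final partial segment still counts as a segment, hence the ceiling.
    return math.ceil(total_frames / frames_per_segment)


def key_frame_indices(total_frames, frames_per_segment=30):
    # One key frame per segment; the first frame of each segment is
    # picked here purely as an illustrative assumption.
    return list(range(0, total_frames, frames_per_segment))


print(count_segments(412))          # 14
print(len(key_frame_indices(412)))  # 14
```

With 412 frames at 30 frames per segment, 412 / 30 is about 13.7, which gives 13 full segments plus one partial segment, consistent with the "approximately 14 segments" figure.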

Also, in column 23, line 64 to column 24, line 3, Jain discloses manually adjusting 
the number of key frames, where the number (segment) is one key frame for every thirty 
frames. Therefore, since Jain teaches the manual adjustment of one key frame or 
representative frame for every thirty frames, it would have been reasonably obvious to 
one of ordinary skill in the art to manually change the number of key (representative) 
frames per segment from anywhere between two to five key or representative frames 
per segment if necessary for accurately enhancing the three-dimensional representation
of the targeted scene. Furthermore, the term "manual adjustment" can imply many
things and that "manual selection" can be interpreted as one form of "manual 
adjustment" since selection is a type of adjustment. 

Dependent claims 32-35 are rejected for at least similar reasons as claim 31.

Thus, the rejection is maintained. 

Regarding applicant's request for interview, the applicant is invited to
request an interview by telephone if there are any issues with regard to this Office
Action that applicant feels need to be discussed to further support their position in
addition to the amendment filed 12/2/05. Otherwise, a written after-final response
should be fully adequate to state the applicant's position on this case.

Claim Rejections - 35 USC § 102 

1. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

2. Claims 9-16, 18, 19, 21, 22, 36 and 37 are rejected under 35 U.S.C. 102(b) as 
being anticipated by Jain et al (5,729,471). 

Regarding claims 9 and 21, Jain discloses a method of recovering a three-
dimensional scene from two-dimensional images, the method comprising:

identifying a sequence of two-dimensional frames that include two-dimensional 
images (fig. 12, note camera 1 obtain video images in two-dimensional form; also see
fig.8, note camera 1 obtains a sequence of two-dimensional images, and cameras 2
and 3 also obtain a corresponding sequence of images; col.22, ln.1-3);

dividing the sequence of images into segments, wherein a segment includes a 
plurality of frames (fig.8, note that camera 1 obtains a sequence of 412 frames for 
approximately 13 seconds, and that every 30 frames obtained for each second, ie. the 
standard NTSC frame rate (30 frames/sec), can be considered a segment, so in this 
case, camera 1 has approximately 14 segments, thus, Jain discloses the division of the 
sequence of images into segments; also, in col.23, ln.58 to col.24, ln.3; Jain discloses
the extraction of key frames by selecting one key frame from every 30 frames, ie. a
segment of a sequence of frames, clearly, Jain discloses there are segments within a
sequence of frames, otherwise, the ascertainment of keyframes would not be possible
without these segments, where each segment is formed from a sequence of 30 frames); 

for each segment, encoding the frames in the segment into at least two virtual 
frames that include a three-dimensional structure for the segment and an uncertainty 
associated with the segment (col. 23, ln.58 to col.24, ln.3; Jain discloses the extraction 
of virtual key frames by selecting one key frame from every 30 frames in that every 30 
frames can be considered a segment of a sequence of frames; also, col.24, ln.38-67, 
Jain discloses the virtual key frames are used to obtain the best possible three- 
dimensional reconstruction of the two-dimensional frame data in that if there is not 
enough known points, ie. uncertainty, from virtual keyframes, thus, segmented frames 
are encoded into at least two virtual key frames to ascertain the best, possible three-
dimensional reconstruction of the two-dimensional frame data to yield the 3D
visualization). 

Regarding claim 10, Jain discloses the identifying the base frame, identifying the 
feature points in the base frame, and defining the segments (col.21, ln.63 to col.22, ln.7;
Jain discloses the identification of feature points in the plural frames that includes the 
first base frame in the segments from the sequence of images). 

Regarding claim 11, Jain discloses the variation of segments and variation of
frames (fig.8, note camera 1 has multiple 413 frames in approximately 13 seconds,
where each segment has 30 frames to obtain approximately 13 segments from camera
1, whereas camera 2 has 181 frames in 6 seconds, or approximately 6 segments from
camera 2, etc.). 

Regarding claim 12, Jain discloses identify feature points, estimating three 
dimensional coordinates, and estimating camera rotation and translation (fig. 12, note 
there are multiple "image to ground projection" sections that are used to calculate and 
project an image or a partial model for each segment of that includes three-dimensional 
occupancy estimation for which a 3D map of is generated in an attempt to form a 
dynamic model; col.21, ln.63 to col.22, ln.7, Jain discloses the obtaining of the feature
points within the frames; col.22, ln.62 to col.23, ln.56, Jain discloses the use of
equations that includes three dimensional coordinates (x, y, z) that includes camera 
position or pose, camera angle and camera parameter to obtain a partial model or a
"image to ground projection").

Regarding claims 13-16, Jain discloses the performance of a two-frame structure
from motion algorithm on each of the segments to create a partial model (fig. 12, note 
there are multiple "image to ground projection" sections that are used to calculate and 
project an image or a partial model for each segment of that includes three-dimensional 
occupancy estimation for which a 3D map of is generated in an attempt to form a 
dynamic model; col.21, ln.63 to col.22, ln.7, Jain discloses the obtaining of the feature
points within the frames; col.22, ln.62 to col.23, ln.56, Jain discloses the use of 
equations that includes three dimensional coordinates (x, y, z) that includes camera 
position or pose, camera angle and camera parameter to obtain a partial model or a 
"image to ground projection"); and eliminating ambiguity (col. 24, ln.38-67, Jain discloses 
the virtual key frames are used to obtain the best possible three-dimensional 
reconstruction of the two-dimensional frame data in that if there is not enough known 
points, ie. uncertainty, from virtual key frames, thus, segmented frames are encoded 
into at least two virtual key frames to ascertain the best, possible three-dimensional 
reconstruction of the two-dimensional frame data to yield the 3D visualization). 

Regarding claim 18, Jain discloses extracting virtual keyframes (col.23, ln.58 to 
col.24, ln.3; Jain discloses the extraction of key frames by selecting one key frame from
every 30 frames in that every 30 frames can be considered a segment of a sequence of 
frames; also, col.24, ln.38-67, Jain discloses the key frames are used to obtain the best 
possible three-dimensional reconstruction of the two-dimensional frame data in that if 
there is not enough known points, ie. uncertainty, from key frames, estimates or bundle 
adjustments were made to ascertain the best, possible three-dimensional reconstruction
of the two-dimensional frame data to yield the 3D visualization) and bundle adjustment
of key frames (fig. 12, note the "3D visualization" section is the product of the adjusting 
of the virtual key frames to produce a complete three-dimensional reconstruction of the 
two dimensional frames obtained by video camera 1 to video camera N; also, col.24,
ln.38-67, Jain discloses the key frames are used to obtain the best possible three- 
dimensional reconstruction of the two-dimensional frame data in that if there is not 
enough known points from key frames, estimates or bundle adjustments were made to 
ascertain the best, possible three-dimensional reconstruction of the two-dimensional 
frame data to yield the 3D visualization). 

Regarding claim 19, Jain discloses performing motion estimation to identify 
feature points (col.21, ln.63 to col.22, ln.7).

Regarding claim 22, Jain discloses the use of a computer-readable medium to 
execute instructions for performing the method of claim 9 (col. 15, ln.65-67). 

Regarding claim 36, Jain discloses a computer-readable medium having 
computer-executable instructions for performing a method comprising: 

providing a sequence of two-dimensional frames (fig. 12, note camera 1 obtain 
video images in two-dimensional form; also see fig.8, note camera 1 obtains a 
sequence of two-dimensional images, and cameras 2 and 3 also obtain a corresponding 
sequence of images; col.22, ln.1-3);

dividing the sequence into segments (fig.8, note that camera 1 obtains a 
sequence of 412 frames for approximately 13 seconds, and that every 30 frames
obtained for each second, ie. the standard NTSC frame rate (30 frames/sec), can be
considered a segment, so in this case, camera 1 has approximately 14 segments, thus,
Jain discloses the division of the sequence of images into segments; also, in col.23,
ln.58 to col.24, ln.3; Jain discloses the extraction of key frames by selecting one key 
frame from every 30 frames, ie. a segment of a sequence of frames, clearly, Jain 
discloses there are segments within a sequence of frames, otherwise, the 
ascertainment of key frames would not be possible without these segments, where each 
segment is formed from a sequence of 30 frames); 

calculating a partial model for each segment that includes three-dimensional 
coordinates and camera pose for features within the frames (fig. 12, note there are 
multiple "image to ground projection" sections that are used to calculate and project an 
image or a partial model for each segment of that includes three-dimensional 
occupancy estimation for which a 3D map of is generated in an attempt to form a 
dynamic model; col.21, ln.63 to col.22, ln.7, Jain discloses the obtaining of the feature
points within the frames; col.22, ln.62 to col.23, ln.56, Jain discloses the use of
equations that includes three dimensional coordinates (x, y, z) that includes camera 
position or pose, camera angle and camera parameter to obtain a partial model or a 
"image to ground projection"); 

extracting virtual keyframes from each partial model, the virtual keyframes 
having three-dimensional coordinates for the frames and an uncertainty associated with 
the frames (col.23, ln.58 to col.24, ln.3; Jain discloses the extraction of key frames by 
selecting one key frame from every 30 frames in that every 30 frames can be 
considered a segment of a sequence of frames; also, col.24, ln.38-67, Jain discloses
the key frames are used to obtain the best possible three-dimensional reconstruction of
the two-dimensional frame data in that if there is not enough known points, ie. 
uncertainty, from key frames, estimates or bundle adjustments were made to ascertain 
the best, possible three-dimensional reconstruction of the two-dimensional frame data 
to yield the 3D visualization); and 

bundle adjusting the virtual key frames to obtain a complete three-dimensional 
reconstruction of the two-dimensional frames (fig. 12, note the "3D visualization" section 
is the product of the adjusting of the virtual key frames to produce a complete three- 
dimensional reconstruction of the two dimensional frames obtained by video camera 1 
to video camera N; also, col.24, ln.38-67, Jain discloses the keyframes are used to 
obtain the best possible three-dimensional reconstruction of the two-dimensional frame 
data in that if there is not enough known points from key frames, estimates or bundle 
adjustments were made to ascertain the best, possible three-dimensional reconstruction 
of the two-dimensional frame data to yield the 3D visualization). 

Regarding claim 37, Jain discloses an apparatus for recovering a three- 
dimensional scene from a sequence of two-dimensional frames by segmenting the 
frames, comprising: 

means for capturing two-dimensional images (fig.12, note camera 1 obtain video 
images in two-dimensional form; also see fig.8, note camera 1 obtains a sequence of 
two-dimensional images, and cameras 2 and 3 also obtain a corresponding sequence of 
images; col.22, ln.1-3);

means for dividing the sequence into segments (fig.8, note that camera 1 obtains
a sequence of 412 frames for approximately 13 seconds, and that every 30 frames 
obtained for each second, ie. the standard NTSC frame rate (30 frames/sec), can be 
considered a segment, so in this case, camera 1 has approximately 14 segments, thus, 
Jain discloses the division of the sequence of images into segments; also, in col.23, 
ln.58 to col.24, ln.3; Jain discloses the extraction of key frames by selecting one key 
frame from every 30 frames, ie. a segment of a sequence of frames, clearly, Jain 
discloses there are segments within a sequence of frames, otherwise, the 
ascertainment of key frames would not be possible without these segments, where each 
segment is formed from a sequence of 30 frames); 

means for calculating a partial model for each segment that includes three-
dimensional coordinates and camera pose for features within the frames (fig.12, note
there are multiple "image to ground projection" sections that are used to calculate and 
project an image or a partial model for each segment of that includes three-dimensional 
occupancy estimation for which a 3D map of is generated in an attempt to form a 
dynamic model; col.21, ln.63 to col.22, ln.7, Jain discloses the obtaining of the feature 
points within the frames; col.22, ln.62 to col.23, ln.56, Jain discloses the use of 
equations that includes three dimensional coordinates (x, y, z) that includes camera 
position or pose, camera angle and camera parameter to obtain a partial model or a 
"image to ground projection"); 

means for extracting virtual key frames from each partial model (col.23, ln.58 to 
col.24, ln.3; Jain discloses the extraction of key frames by selecting one key frame from
every 30 frames in that every 30 frames can be considered a segment of a sequence of
frames); and 

means for bundle adjusting the virtual keyframes to obtain a complete three- 
dimensional reconstruction of the two-dimensional frames (fig. 12, note the "3D 
visualization" section is the product of the adjusting of the virtual key frames to produce 
a complete three-dimensional reconstruction of the two dimensional frames obtained by 
video camera 1 to video camera N; also, col.24, ln.38-67, Jain discloses the keyframes
are used to obtain the best possible three-dimensional reconstruction of the two- 
dimensional frame data in that if there is not enough known points from key frames, 
estimates or bundle adjustments were made to ascertain the best, possible three- 
dimensional reconstruction of the two-dimensional frame data to yield the 3D 
visualization). 

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 1-2, 4-8, 17, 20 and 23-30 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Jain et al (5,729,471) in view of Lee (5,612,743). 

Regarding claim 1, Jain discloses a method of recovering a three-dimensional
scene from two-dimensional images, the method comprising:

providing a sequence of frames (fig. 12, note camera 1 obtain video images in
two-dimensional form; also see fig.8, note camera 1 obtains a sequence of two- 
dimensional images, and cameras 2 and 3 also obtain a corresponding sequence of 
images; col.22, ln.1-3);

dividing the sequence of frames into frame segments wherein the frames in the 
sequence comprise feature points and wherein dividing the sequence of frames into 
frame segments (fig.8, note that camera 1 obtains a sequence of 412 frames for 
approximately 13 seconds, and that every 30 frames obtained for each second, ie. the 
standard NTSC frame rate (30 frames/sec), can be considered a frame segment, so in 
this case, camera 1 has approximately 14 segments, thus, Jain discloses the division of 
the sequence of images into segments; also, in col.23, ln.58 to col.24, ln.3; Jain
discloses the extraction of key frames by selecting one key frame from every 30 frames, 
ie. a segment of a sequence of frames, clearly, Jain discloses there are segments within 
a sequence of frames, otherwise, the ascertainment of key frames would not be 
possible without these segments, where each segment is formed from a sequence of 30 
frames; also fig. 12, note there are multiple "image to ground projection" sections that 
are used to calculate and project an image or a partial model for each segment of that 
includes three-dimensional occupancy estimation for which a 3D map of is generated in 
an attempt to form a dynamic model; col.21, ln.63 to col.22, ln.7, Jain discloses the 
obtaining of the feature points within the frames; col.22, ln.62 to col.23, ln.56, Jain 
discloses the use of equations that includes three dimensional coordinates (x, y, z) that
includes camera position or pose, camera angle and camera parameter to obtain a
partial model or a "image to ground projection"); 

performing three-dimensional reconstruction individually for each frame segment 
derived by dividing the sequence of frames (fig.12, note there are multiple "image to 
ground projection" sections that are used to calculate and project an image or a partial 
model for each segment of that includes three-dimensional occupancy estimation for 
which a 3D map of is generated in an attempt to form a dynamic model; col.21, ln.63 to 
col.22, ln.7, Jain discloses the obtaining of the feature points within the frames; col.22, 
ln.62 to col.23, ln.56, Jain discloses the use of equations that includes three
dimensional coordinates (x, y, z) that includes camera position or pose, camera angle 
and camera parameter to obtain a partial model or a "image to ground projection"); and 

combining the three-dimensional reconstructed segments together to recover a 
three-dimensional scene for the sequence of images (fig.12, note the "3D visualization" 
section is the product of the adjusting of the virtual key frames to produce a complete 
three-dimensional reconstruction of the two dimensional frames obtained by video 
camera 1 to video camera N; also, col.24, ln.38-67, Jain discloses the key frames are 
used to obtain the best possible three-dimensional reconstruction of the two- 
dimensional frame data in that if there is not enough known points from keyframes, 
estimates or bundle adjustments were made to ascertain the best, possible three- 
dimensional reconstruction of the two-dimensional frame data to yield the 3D 
visualization).

Jain does not specifically disclose the determining whether a threshold number of
feature points being tracked between the frames of the frame segments. However, Lee 
teaches the determining whether a threshold number of feature points being tracked 
between the frames of the frame segments (col.2, ln.65 to col.3, ln.31; Lee teaches the
use of threshold values TH and comparison of threshold values of feature points 
between the current frame and the reference frame to check if the threshold is 
exceeded). Therefore, it would have been obvious to one of ordinary skill in the art to 
combine the teachings of Jain and Lee, as a whole, for improving the encoding of video 
image data so as to accurately encode images via the selection of feature points 
according to the motion of objects in a financially robust manner (col.2, ln.60-64). 

Regarding claim 2, Jain discloses the use of virtual key frames (col.23, ln.58 to 
col.24, ln.3; Jain discloses the extraction of key frames by selecting one key frame from 
every 30 frames, ie. a segment of a sequence of frames). 

Regarding claim 4, Jain discloses the performance of a two-frame structure from 
motion algorithm on each of the segments to create a partial model (fig.12, note there 
are multiple "image to ground projection" sections that are used to calculate and project 
an image or a partial model for each segment of that includes three-dimensional 
occupancy estimation for which a 3D map of is generated in an attempt to form a 
dynamic model; col.21, ln.63 to col.22, ln.7, Jain discloses the obtaining of the feature
points within the frames; col.22, ln.62 to col.23, ln.56, Jain discloses the use of 
equations that includes three dimensional coordinates (x, y, z) that includes camera
position or pose, camera angle and camera parameter to obtain a partial model or a
"image to ground projection"); and eliminating ambiguity (col. 24, ln.38-67, Jain discloses 
the virtual key frames are used to obtain the best possible three-dimensional 
reconstruction of the two-dimensional frame data in that if there is not enough known 
points, ie. uncertainty, from virtual key frames, thus, segmented frames are encoded
into at least two virtual key frames to ascertain the best, possible three-dimensional 
reconstruction of the two-dimensional frame data to yield the 3D visualization). 

Regarding claims 5 and 7, Jain discloses extracting virtual keyframes (col.23, 
ln.58 to col.24, ln.3; Jain discloses the extraction of key frames by selecting one key
frame from every 30 frames in that every 30 frames can be considered a segment of a 
sequence of frames; also, col.24, ln.38-67, Jain discloses the key frames are used to 
obtain the best possible three-dimensional reconstruction of the two-dimensional frame 
data in that if there is not enough known points, ie. uncertainty, from key frames, 
estimates or bundle adjustments were made to ascertain the best, possible three- 
dimensional reconstruction of the two-dimensional frame data to yield the 3D 
visualization) and bundle adjustment of key frames (fig. 12, note the "3D visualization" 
section is the product of the adjusting of the virtual key frames to produce a complete 
three-dimensional reconstruction of the two dimensional frames obtained by video 
camera 1 to video camera N; also, col.24, ln.38-67, Jain discloses the key frames are 
used to obtain the best possible three-dimensional reconstruction of the two- 
dimensional frame data in that if there is not enough known points from key frames, 
estimates or bundle adjustments were made to ascertain the best, possible 
three-dimensional reconstruction of the two-dimensional frame data to yield the 3D 
visualization). 

Regarding claim 6, Jain discloses identifying feature points, estimating three 
dimensional coordinates, and estimating camera rotation and translation (fig. 12, note 
there are multiple "image to ground projection" sections that are used to calculate and 
project an image or a partial model for each segment that includes three-dimensional 
occupancy estimation, for which a 3D map is generated in an attempt to form a 
dynamic model; col.21, ln.63 to col.22, ln.7, Jain discloses the obtaining of the feature 
points within the frames; col.22, ln.62 to col.23, ln.56, Jain discloses the use of 
equations that include three dimensional coordinates (x, y, z) that include camera 
position or pose, camera angle and camera parameter to obtain a partial model or an 
"image to ground projection"). 

Regarding claim 8, Jain discloses the use of a computer-readable medium to 
execute instructions for performing the method of claim 1 (col. 15, ln.65-67). 

Regarding claims 17, 23, 24 and 28, Jain discloses a method of recovering a 
three-dimensional scene from a sequence of two-dimensional frames, comprising: 

identifying at least a first base frame in a sequence of two dimensional frames 
(fig. 12, note camera 1 obtain video images in two-dimensional form; also see fig.8, note 
camera 1 obtains a sequence of two-dimensional images, and cameras 2 and 3 also 
obtain a corresponding sequence of images; see col.22, ln.1-3; fig.8, note that camera 1 
obtains a sequence of 412 frames for approximately 13 seconds, and that every 30 
frames obtained for each second, ie. the standard NTSC frame rate (30 frames/sec), 
can be considered a segment, so in this case, camera 1 has approximately 14 
segments, thus, Jain discloses the division of the sequence of images into segments; 
also, in col.23, ln.58 to col.24, ln.3; Jain discloses the extraction of key frames by 
selecting one key frame from every 30 frames, ie. a segment of a sequence of frames, 
clearly, Jain discloses there are segments within a sequence of frames, otherwise, the 
ascertainment of key frames would not be possible without these segments, where each 
segment is formed from a sequence of 30 frames); 

adding the at least first base frame to create a first segment of the sequence 
(fig. 8, note that camera 1 obtains a sequence of 412 frames for approximately 13 
seconds, and that every 30 frames obtained for each second, ie. the standard NTSC 
frame rate (30 frames/sec), can be considered a frame segment, so in this case, 
camera 1 has approximately 14 frame segments, so a first segment of the sequence is 
created); 

identifying feature points in at least a first base frame in a first segment (col.21, 
ln.63 to col.22, ln.7; Jain discloses the identification of feature points in the plural frames 
that includes the first base frame); and 

analyzing a second frame in the segment to identify the feature points in the 
second frame (col.21, ln.63 to col.22, ln.7; Jain discloses the identification of feature 
points in each frame from a plurality of frames that includes the second frame). 
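
The segment count relied on above (412 frames, 30 frames per segment, approximately 14 segments) can be checked with a short calculation. The Python sketch below is illustrative only: the function name `split_into_segments` and the choice to keep a shorter final segment are assumptions made here, not details taken from Jain or from the claims.

```python
def split_into_segments(num_frames, frames_per_segment=30):
    """Return (start, end) index pairs, end exclusive, for fixed-length segments.

    The last segment may be shorter than frames_per_segment (an assumption).
    """
    if num_frames < 0 or frames_per_segment <= 0:
        raise ValueError("num_frames must be >= 0 and frames_per_segment > 0")
    return [
        (start, min(start + frames_per_segment, num_frames))
        for start in range(0, num_frames, frames_per_segment)
    ]


segments = split_into_segments(412)
print(len(segments))   # 14 (13 full segments plus one of 22 frames)
print(segments[-1])    # (390, 412)
```

Thirteen full segments cover 390 frames, and the remaining 22 frames form a fourteenth, shorter segment, which is consistent with the "approximately 14 segments" figure.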

Jain does not specifically disclose the adding the second frame to the segment. 
However, Jain discloses the manual adjustment of the number of key frames, where the 
number is one key frame for every thirty frames, ie. a segment (col.23, ln.64 to col.24, 
ln.3). Therefore, since Jain teaches the manual adjustment of one key frame or 
representative frame for every thirty frames, it would have been obvious to one of 
ordinary skill in the art to manually change the number of key (representative) frames 
per segment from anywhere between two to five key or representative frames per 
segment if necessary for accurately enhancing the three-dimensional representation of 
the targeted scene. 

Jain does not specifically disclose the determining whether a threshold number of 
feature points from base frame are identified in the second frame; if a threshold number 
of feature points are identified in the second frame, adding the second frame to the 
segment; and repeating the analyzing step, determining step and adding step for 
subsequent frames until the number of feature points in a frame falls below the 
threshold number. However, Lee teaches the determining whether a threshold number 
of feature points from base frame are identified in the second frame (col.2, ln.65 to 
col.3, ln.31; Lee teaches the use of threshold values TH and comparison of threshold 
values of feature points between the current frame and the reference frame to check if 
the threshold is exceeded); if a threshold number of feature points are identified in the 
second frame, adding the second frame to the segment (col.2, ln.65 to col.3, ln.31); and 
repeating the analyzing step, determining step and adding step for subsequent frames 
until the number of feature points in a frame falls below the threshold number (fig.3, 
note Lee discloses the process is cyclical and repetitive, thus the analysis, 
determination and addition steps are repeated). Therefore, it would have been obvious 
to one of ordinary skill in the art to combine the teachings of Jain and Lee, as a whole, 
for improving the encoding of video image data so as to accurately encode images via 
the selection of feature points according to the motion of objects in a financially robust 
manner (col.2, ln.60-64). 
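
The analyzing, determining and adding steps recited in the claim can be pictured as a simple loop. The Python sketch below is a minimal illustration under stated assumptions: each frame is represented as a set of feature point identifiers that a tracker has already matched across frames, and the name `grow_segment` is hypothetical. It is not presented as the implementation of either Jain or Lee.

```python
def grow_segment(frames, base_index, threshold):
    """Grow a segment from a base frame while enough base features persist.

    frames     -- list of sets of feature identifiers, one set per frame
                  (assumes features were already matched across frames)
    base_index -- index of the base frame that starts the segment
    threshold  -- minimum number of base-frame features a frame must contain
    """
    base_features = frames[base_index]
    segment = [base_index]
    for index in range(base_index + 1, len(frames)):
        tracked = base_features & frames[index]   # analyzing step
        if len(tracked) < threshold:              # determining step
            break                                 # count fell below threshold
        segment.append(index)                     # adding step
    return segment


frames = [{1, 2, 3, 4, 5}, {1, 2, 3, 4}, {1, 2, 3, 9}, {1, 2, 8, 9}, {1, 2, 3, 4}]
print(grow_segment(frames, 0, 3))   # [0, 1, 2]
```

In the example, the fourth frame retains only two of the base frame's feature points, so the segment closes after the third frame even though a later frame would again satisfy the threshold.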

Regarding claim 20, Jain does not specifically disclose creating a template block 
in a first frame, creating a search window used in the second frame, and comparing an 
intensity difference between the search window and the template block to locate the 
feature point in the second frame. However, Lee teaches creating a template block 
in a first frame, creating a search window used in the second frame, and comparing an 
intensity difference between the search window and the template block to locate the 
feature point in the second frame (fig.4, note frame A and frame B are the first and 
second frames, note fig.3, element 313 also discloses the comparison process to 
compare differences to determine or locate the feature point in the second frame). 
Therefore, it would have been obvious to one of ordinary skill in the art to combine the 
teachings of Jain and Lee, as a whole, for improving the encoding of video image data 
so as to accurately encode images via the selection of feature points according to the 
motion of objects in a financially robust manner (col.2, ln.60-64). 
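
The template block and search window limitation of claim 20 corresponds to ordinary block matching. The Python sketch below is an assumption-laden illustration: it uses the sum of absolute intensity differences as the comparison measure, treats frames as lists of rows of grey levels, and uses hypothetical names (`intensity_difference`, `locate_feature`). Neither reference is asserted to use this exact measure.

```python
def intensity_difference(frame, top, left, template):
    """Sum of absolute intensity differences between template and a frame patch."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            total += abs(frame[top + r][left + c] - value)
    return total


def locate_feature(second_frame, template, window):
    """Find the template position inside a search window of the second frame.

    window -- (top, left, height, width) of the search window; it is assumed
              to lie entirely inside second_frame.
    Returns ((row, col), difference) for the best-matching position.
    """
    top, left, height, width = window
    t_h, t_w = len(template), len(template[0])
    best_pos, best_cost = None, None
    for r in range(top, top + height - t_h + 1):
        for c in range(left, left + width - t_w + 1):
            cost = intensity_difference(second_frame, r, c, template)
            if best_cost is None or cost < best_cost:
                best_pos, best_cost = (r, c), cost
    return best_pos, best_cost


first_frame = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0, 0],
    [0, 7, 6, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
second_frame = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0, 0],
    [0, 0, 7, 6, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
# Template block: the 2x2 patch around the feature in the first frame.
template = [row[1:3] for row in first_frame[1:3]]
print(locate_feature(second_frame, template, (0, 0, 5, 6)))   # ((2, 2), 0)
```

The feature that sat at row 1, column 1 of the first frame is located at row 2, column 2 of the second frame, where the intensity difference is zero.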

Regarding claim 25, Jain discloses performing motion estimation to identify 
feature points (col.21, ln.63 to col.22, ln.7). 

Regarding claim 26, Jain discloses the identification of corners as feature points 
(col.22, ln.15-22; note the disclosure of borders, hashlines, marks are feature points to 
create corners as to determine camera status and pose). 

Regarding claim 27, Jain discloses the number of frames can vary between 
segments (col.23, ln.64 to col.24, ln.3). 

Regarding claim 29, Jain discloses the bundle adjustment of key frames (fig. 12, 
note the "3D visualization" section is the product of the adjusting of the virtual key 
frames to produce a complete three-dimensional reconstruction of the two dimensional 
frames obtained by video camera 1 to video camera N; also, col.24, ln.38-67, Jain 
discloses the key frames are used to obtain the best possible three-dimensional 
reconstruction of the two-dimensional frame data in that if there is not enough known 
points from key frames, estimates or bundle adjustments were made to ascertain the 
best, possible three-dimensional reconstruction of the two-dimensional frame data to 
yield the 3D visualization). 

Regarding claim 30, Jain discloses the use of a computer-readable medium to 
execute instructions for performing the method of claim 23 (col. 15, ln.65-67). 

Claims 31-35 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Jain et al. (5,729,471). 

Regarding claim 31, Jain discloses a method of recovering a three-dimensional 
scene from a sequence of two-dimensional frames (fig. 12), an improvement comprising: 

dividing a long sequence of frames into segments (fig.8, note that camera 1 
obtains a sequence of 412 frames for approximately 13 seconds, and that every 30 
frames obtained for each second, ie. the standard NTSC frame rate (30 frames/sec), 
can be considered a segment, so in this case, camera 1 has approximately 14 
segments, thus, Jain discloses the division of the sequence of images into segments; 
also, in col.23, ln.58 to col.24, ln.3; Jain discloses the extraction of key frames by 
selecting one key frame from every 30 frames, ie. a segment of a sequence of frames, 
clearly, Jain discloses there are segments within a sequence of frames, otherwise, the 
ascertainment of key frames would not be possible without these segments, where each 
segment is formed from a sequence of 30 frames), 

wherein the representative frames are used to recover the three-dimensional 
scene and remaining frames are discarded so that three-dimensional scene is 
effectively compressed (col.23, ln.58 to col. 24, ln.3; Jain discloses the extraction of 
virtual key frames by selecting one key frame from every 30 frames in that every 30 
frames can be considered a segment of a sequence of frames; also, col.24, ln.38-67, 
Jain discloses the virtual keyframes are used to obtain the best possible three- 
dimensional reconstruction of the two-dimensional frame data in that if there is not 
enough known points, ie. uncertainty, from virtual keyframes, thus, segmented frames 
are encoded into at least two virtual key frames to ascertain the best, possible three- 
dimensional reconstruction of the two-dimensional frame data to yield the 3D 
visualization, the excess remaining frames are discarded). 

Jain does not specifically disclose the reducing the number of frames in each 
segment by representing the segments using between two and five representative 
frames per segment. However, Jain discloses the manual adjustment of the number of 
key frames, where the number is one key frame for every thirty frames, ie. a segment 
(col.23, ln.64 to col.24, ln.3). Therefore, since Jain teaches the manual adjustment of 
one key frame or representative frame for every thirty frames, it would have been 
obvious to one of ordinary skill in the art to manually change the number of key 
(representative) frames per segment from anywhere between two to five key or 
representative frames per segment if necessary for accurately enhancing the three- 
dimensional representation of the targeted scene. 
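
Reducing a segment to between two and five representative frames can be illustrated as follows. This Python sketch is hypothetical: the even spacing rule and the name `pick_representatives` are assumptions chosen for illustration, since neither the claim language quoted here nor the cited passages of Jain state how the representative frames are chosen.

```python
def pick_representatives(start, end, count):
    """Pick `count` evenly spaced frame indices from the segment [start, end).

    The first and last frames of the segment are always included.
    Even spacing is an illustrative assumption, not a disclosed method.
    """
    if not 2 <= count <= 5:
        raise ValueError("count must be between two and five")
    if end - start < count:
        raise ValueError("segment has fewer frames than requested")
    last = end - 1
    return [start + (i * (last - start)) // (count - 1) for i in range(count)]


print(pick_representatives(0, 30, 2))   # [0, 29]
print(pick_representatives(0, 30, 5))   # [0, 7, 14, 21, 29]
```

For a thirty-frame segment, the remaining frames (28 in the first case, 25 in the second) would be the ones discarded.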

Regarding claim 32, Jain discloses that each representative frame has an 
associated uncertainty (col.24, ln.38-67, Jain discloses the key frames are used to 
obtain the best possible three-dimensional reconstruction of the two-dimensional frame 
data in that if there is not enough known points, ie. uncertainty, from key frames, 
estimates or bundle adjustments were made to ascertain the best, possible three- 
dimensional reconstruction of the two-dimensional frame data to yield the 3D 
visualization). 

Regarding claim 33, Jain discloses the long sequence of frames includes over 75 
frames (fig.8, note that camera 1 obtains a sequence of 412 frames, which clearly is 
over 75 frames). 

Regarding claim 34, Jain discloses the division of the long sequence into 
segments and tracking feature points (fig.8, note that camera 1 obtains a sequence of 
412 frames for approximately 13 seconds, and that every 30 frames obtained for each 
second, ie. the standard NTSC frame rate (30 frames/sec), can be considered a 
segment, so in this case, camera 1 has approximately 14 segments, thus, Jain 
discloses the division of the sequence of images into segments; also, in col.23, ln.58 to 
col.24, ln.3; Jain discloses the extraction of key frames by selecting one key frame from 
every 30 frames, ie. a segment of a sequence of frames, clearly, Jain discloses there 
are segments within a sequence of frames, otherwise, the ascertainment of key frames 
would not be possible without these segments, where each segment is formed from a 
sequence of 30 frames; col.21, ln.63 to col.22, ln.7, Jain discloses the obtaining of the 
feature points within the frames). 

Regarding claim 35, Jain discloses the performance of a two-frame structure 
from motion algorithm on each of the segments to create a partial model (fig.12, note 
there are multiple "image to ground projection" sections that are used to calculate and 
project an image or a partial model for each segment that includes three-dimensional 
occupancy estimation, for which a 3D map is generated in an attempt to form a 
dynamic model; col.21, ln.63 to col.22, ln.7, Jain discloses the obtaining of the feature 
points within the frames; col.22, ln.62 to col.23, ln.56, Jain discloses the use of 
equations that include three dimensional coordinates (x, y, z) that include camera 
position or pose, camera angle and camera parameter to obtain a partial model or an 
"image to ground projection"). 

Conclusion 

5. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 